Career Outcomes of International Master's Recipients from Chinese Institutions: A Study of Students From Three ASEAN States
As the third largest destination country for international postsecondary students, China has received nearly 500,000 international students, more than 20% of whom are from ASEAN member states (Department of International Cooperation and Exchanges, 2019). Compared with students from Western societies, most ASEAN students come from developing countries and may have a stronger need to generate career benefits through studying abroad. ASEAN students in China and their career outcomes, however, have long been overlooked in existing research.
In this qualitative study, I applied Human Capital Theory (HCT) and Neo-Racism Theory (NRT) to investigate the career outcomes of graduated ASEAN students who obtained a master's degree in Chinese Language from mainland China. I conducted in-depth, semi-structured interviews with 16 participants born in Malaysia, Myanmar, and Thailand, investigating their perceptions of the benefits and costs of studying in China, the factors impacting their career outcomes, and their suggestions for the Chinese government and universities. I also explored how participants' experiences and perceptions varied across sending countries.
Participants recognized that studying in China can improve their employability by enhancing their technical skills, language skills, and soft skills. Establishing professional networks, holding a master’s degree granted by Chinese universities, and learning from the workplace culture in China can also contribute to their professional development in both China and their home countries.
Based on participants' perceptions, the influential factors for career outcomes can be categorized into international/national, social/institutional, and personal/family factors. China-ASEAN economic cooperation has created opportunities for these participants, who have studied in China and know China well. China's unclear policies on international students, however, have confused participants and created barriers when they seek jobs in China. At the social level, some participants experienced discrimination against non-White races, which discouraged them from remaining in China, but most participants were impressed by China's development and wanted to work there. Participants improved their employability through the courses offered in their programs, and those who graduated from high-reputation universities, or from universities that cooperate with ASEAN states, tended to obtain better career opportunities. Most Chinese universities, however, adopt a segregation policy, dividing Chinese and international students into different classes and dorms; participants therefore lacked opportunities to interact with local students and build local networks. Moreover, many advisors in China had limited knowledge of ASEAN states and could not offer the necessary help with participants' career development. At the personal and family level, personal experience is vital in job-seeking, and family responsibilities and parents' expectations have pulled many participants back to their sending countries. Most participants had no suggestions for the Chinese government and institutions, although some expected fairer scholarship policies and clearer immigration regulations.
The results partly echo HCT and NRT but also challenge some of their arguments. This research reminds scholars to be more cautious when applying Western-originated theories in Asia; factors such as politics, culture, and economic development in the studied areas should be considered. This study also generated a model showing how influential factors interact with each other and impact participants' career outcomes.
Graph Summarization via Node Grouping: A Spectral Algorithm
Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.
Peer reviewed
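The spectral phase can be illustrated with a short sketch. This is not the authors' SpecSumm code: it is a minimal numpy reimplementation of the idea (k-means on the k largest-magnitude eigenvectors of the adjacency matrix), with the greedy refinement phase omitted and all names invented here.

```python
import numpy as np

def spectral_supernodes(A, k, iters=20):
    """Group nodes into k supernodes by k-means clustering on the k
    largest-magnitude eigenvectors of the adjacency matrix (the spectral
    phase of a SpecSumm-style summarizer); the greedy refinement phase
    that further improves the summary is omitted."""
    w, V = np.linalg.eigh(A)                 # A must be symmetric
    X = V[:, np.argsort(-np.abs(w))[:k]]     # spectral node embedding
    # Deterministic farthest-first seeding, then plain Lloyd's k-means.
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy graph: two disjoint triangles; the natural 2-supernode summary
# groups each triangle into one supernode.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1.0
labels = spectral_supernodes(A, k=2)
```

On this toy input the two triangles land in different supernodes, since their spectral embeddings collapse to two well-separated points.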
Streaming Algorithms for Diversity Maximization with Fairness Constraints
Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set of elements, it asks to select a subset of elements with maximum diversity, as quantified by the dissimilarities among the elements within the subset. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set is partitioned into disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset contains a specified number of elements from each group. A streaming algorithm should process the stream sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams: the first is specific to the case of two groups, and the second provides an approximation guarantee for an arbitrary number of groups. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
Comment: 13 pages, 11 figures; published in ICDE 202
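As a toy illustration of the streaming setting, the following one-pass heuristic keeps an arriving element when its group's quota is unfilled and it is at least a threshold distance from everything kept so far. This is an assumed simplification for intuition only; it is not either of the paper's algorithms and carries no approximation guarantee, and all names are invented here.

```python
import math

def fair_diverse_stream(stream, quotas, d):
    """One-pass threshold heuristic for fair max-min diversification:
    keep an arriving (point, group) pair if its group's quota is not yet
    filled and the point is at least d away from everything kept so far.
    Illustrative only -- no approximation guarantee is claimed."""
    kept = []                              # list of (point, group)
    counts = {g: 0 for g in quotas}
    for point, group in stream:
        if counts[group] >= quotas[group]:
            continue                       # group already has enough picks
        if all(math.dist(point, p) >= d for p, _ in kept):
            kept.append((point, group))
            counts[group] += 1
    return kept

# Points from two groups ("a"/"b"); quota of 2 per group, threshold 3.
stream = [((0, 0), "a"), ((0.5, 0), "a"), ((10, 0), "b"),
          ((10.5, 0), "b"), ((0, 10), "b"), ((10, 10), "a")]
picked = fair_diverse_stream(stream, quotas={"a": 2, "b": 2}, d=3)
```

Near-duplicate points such as (0.5, 0) are filtered out by the distance threshold, while the quota check enforces the fairness constraint.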
Spectral Normalized-Cut Graph Partitioning with Fairness Constraints
Normalized-cut graph partitioning aims to divide the set of nodes in a graph
into disjoint clusters to minimize the fraction of the total edges between
any cluster and all other clusters. In this paper, we consider a fair variant
of the partitioning problem wherein nodes are characterized by a categorical
sensitive attribute (e.g., gender or race) indicating membership to different
demographic groups. Our goal is to ensure that each group is approximately
proportionally represented in each cluster while minimizing the normalized cut
value. To resolve this problem, we propose a two-phase spectral algorithm
called FNM. In the first phase, we add an augmented Lagrangian term based on
our fairness criteria to the objective function for obtaining a fairer spectral
node embedding. Then, in the second phase, we design a rounding scheme to
produce clusters from the fair embedding that effectively trades off
fairness and partition quality. Through comprehensive experiments on nine
benchmark datasets, we demonstrate the superior performance of FNM compared
with three baseline methods.
Comment: 17 pages, 7 figures, accepted to the 26th European Conference on Artificial Intelligence (ECAI 2023)
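The two objectives FNM trades off can be made concrete with a small evaluation helper. This is my own sketch, not part of FNM: it computes the normalized-cut value of a given partition and each group's representation in each cluster relative to its overall share (1.0 meaning perfect proportional representation).

```python
import numpy as np

def ncut_and_balance(A, labels, groups):
    """Evaluate a partition on the two criteria a fair normalized-cut method
    balances: the normalized-cut value, and each group's per-cluster share
    divided by its share in the whole graph (1.0 = proportional)."""
    labels, groups = np.asarray(labels), np.asarray(groups)
    ncut = 0.0
    for c in np.unique(labels):
        mask = labels == c
        cut = A[mask][:, ~mask].sum()      # edge weight leaving cluster c
        vol = A[mask].sum()                # total degree inside cluster c
        ncut += cut / vol
    balance = {
        (c, g): np.mean(groups[labels == c] == g) / np.mean(groups == g)
        for c in np.unique(labels) for g in np.unique(groups)
    }
    return ncut, balance

# 4-cycle split into two edges; each cluster holds one member of each group.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[u, v] = A[v, u] = 1.0
ncut, balance = ncut_and_balance(A, labels=[0, 0, 1, 1],
                                 groups=["x", "y", "x", "y"])
```

Here every balance ratio equals 1.0 (a perfectly fair partition) and the normalized-cut value is 1.0.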
Fair and Representative Subset Selection from Data Streams
We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint k. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a (1/2-ε)-approximation algorithm that requires O((1/ε) log(k/ε)) passes over the stream for any constant ε>0. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of (1/2-ε) when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.
Peer reviewed
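The fairness constraint can be illustrated with a quota-respecting greedy for maximum coverage. This is a plain batch sketch, not the paper's streaming algorithms, and the instance and names are invented for illustration.

```python
def fair_greedy_coverage(sets_by_item, quotas, k):
    """Greedy maximum coverage under per-group selection limits: each step
    picks the item with the largest marginal coverage whose group cap is
    not yet reached. An offline illustration of the fairness constraint;
    the paper's algorithms instead work over a data stream."""
    covered, chosen = set(), []
    counts = {g: 0 for g in quotas}
    items = dict(sets_by_item)          # item -> (group, set of elements)
    while len(chosen) < k:
        best, gain = None, 0
        for item, (group, elems) in items.items():
            if counts[group] >= quotas[group]:
                continue                # this group's cap is exhausted
            marginal = len(elems - covered)
            if marginal > gain:
                best, gain = item, marginal
        if best is None:                # nothing adds coverage or fits a cap
            break
        group, elems = items.pop(best)
        chosen.append(best)
        covered |= elems
        counts[group] += 1
    return chosen, covered

# Items from two demographic groups with a cap of one selection per group.
catalog = {
    "u1": ("g1", {1, 2, 3, 4}),
    "u2": ("g1", {1, 2, 3}),
    "u3": ("g2", {5, 6}),
    "u4": ("g2", {4, 5}),
}
chosen, covered = fair_greedy_coverage(catalog, quotas={"g1": 1, "g2": 1}, k=2)
```

After picking "u1", the cap on group g1 blocks "u2", so the second pick comes from g2 even though it covers fewer new elements.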
Balancing Utility and Fairness in Submodular Maximization (Technical Report)
Submodular function maximization is central in numerous data science
applications, including data summarization, influence maximization, and
recommendation. In many of these problems, our goal is to find a solution that
maximizes the average of the utilities for all users, each measured by a
monotone submodular function. When the population of users is composed of
several demographic groups, another critical problem is whether the utility is
fairly distributed across groups. In the context of submodular optimization, we
seek to improve the welfare of the least well-off group, i.e., to
maximize the minimum utility for any group, to ensure fairness. Although the
utility and fairness objectives are both desirable, they might
contradict each other, and, to our knowledge, little attention has been paid to
optimizing them jointly. In this paper, we propose a novel problem called
Bicriteria Submodular Maximization (BSM) to strike a balance between
utility and fairness. Specifically, it requires finding a fixed-size solution
to maximize the utility function, subject to the value of the fairness function
not being below a threshold. Since BSM is inapproximable within any constant
factor in general, we propose efficient data-dependent approximation algorithms
for BSM by converting it into other submodular optimization problems and
utilizing existing algorithms for the converted problems to obtain solutions to
BSM. Using real-world and synthetic datasets, we showcase applications of our
framework in three submodular maximization problems, namely maximum coverage,
influence maximization, and facility location.
Comment: 13 pages, 7 figures, under review
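On a tiny instance, the BSM problem statement itself can be checked by brute force. The sketch below uses a toy coverage formulation of my own (items, groups, and utilities are invented): it enumerates size-k solutions and keeps the best one whose minimum per-group utility clears the threshold. The paper's contribution is efficient approximation algorithms, which this does not reproduce.

```python
from itertools import combinations

# Toy coverage instance: each item covers some elements, and each
# demographic group derives utility from a subset of the elements.
ITEMS = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {5, 6}}
GROUP_ELEMS = {"g1": {1, 2, 3}, "g2": {4, 5, 6}}

def utilities(sol):
    covered = set().union(*(ITEMS[i] for i in sol))
    overall = len(covered)                         # utility objective
    per_group = {g: len(covered & e) for g, e in GROUP_ELEMS.items()}
    return overall, per_group                      # fairness = min over groups

def bsm_bruteforce(k, tau):
    """Brute-force version of the BSM problem: among size-k solutions whose
    minimum per-group utility is at least tau, maximize overall utility.
    Only feasible on tiny instances."""
    best, best_util = None, -1
    for sol in combinations(sorted(ITEMS), k):
        overall, per_group = utilities(sol)
        if min(per_group.values()) >= tau and overall > best_util:
            best, best_util = set(sol), overall
    return best, best_util

best, best_util = bsm_bruteforce(k=2, tau=2)
```

With tau=2 the fairness threshold rules out utility-heavy but one-sided solutions like {"a", "b"}, which give group g2 zero utility.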
Towards an Instance-Optimal Z-Index
We present preliminary results on instance-optimal variants of the Z-index, a well-known spatial index that makes use of the Z-order curve. Unlike the base Z-index, the variants we propose aim to adapt to the data and range-query workloads of the given setting. Specifically, we provide an optimal algorithm that builds a Z-index that minimizes the expected number of retrieved data points for the given data and query workload. Moreover, since the optimal algorithm requires supra-linear running time, we additionally propose efficient heuristic algorithms to use in its place. Our experiments evaluate the performance of the resultant Z-indexes.
Peer reviewed
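For readers unfamiliar with the Z-order curve underlying the Z-index, here is a minimal Morton-encoding sketch. This is the standard bit-interleaving construction, not the paper's instance-optimal variant.

```python
def z_value(x, y, bits=16):
    """Morton/Z-order value of a 2-D point: interleave the bits of x and y.
    Sorting points by this value arranges them along the Z-order curve
    that a Z-index partitions into pages."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits at even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits at odd positions
    return z

# The four cells of a 2x2 grid are visited in the curve's namesake "Z" shape.
order = sorted([(1, 1), (0, 0), (1, 0), (0, 1)], key=lambda p: z_value(*p))
```

Because interleaving preserves locality, points close in 2-D space tend to be close in Z-value, which is what makes one-dimensional range partitioning over Z-values effective for spatial queries.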
Coresets for minimum enclosing balls over sliding windows
Coresets are important tools to generate concise summaries of massive
datasets for approximate analysis. A coreset is a small subset of points
extracted from the original point set such that certain geometric properties
are preserved with provable guarantees. This paper investigates the problem of
maintaining a coreset to preserve the minimum enclosing ball (MEB) for a
sliding window of points that are continuously updated in a data stream.
Although the problem has been extensively studied in batch and append-only
streaming settings, no efficient sliding-window solution is available yet. In
this work, we first introduce an algorithm, called AOMEB, to build a coreset
for MEB in an append-only stream. AOMEB improves the practical performance of
the state-of-the-art algorithm while having the same approximation ratio.
Furthermore, using AOMEB as a building block, we propose two novel algorithms,
namely SWMEB and SWMEB+, to maintain coresets for MEB over the sliding window
with constant approximation ratios. The proposed algorithms also support
coresets for MEB in a reproducing kernel Hilbert space (RKHS). Finally,
extensive experiments on real-world and synthetic datasets demonstrate that
SWMEB and SWMEB+ achieve speedups of up to four orders of magnitude over the
state-of-the-art batch algorithm while providing coresets for MEB with rather
small errors compared to the optimal ones.
Comment: 28 pages, 10 figures, to appear in the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19)